REMARKS

In response to the above-identified Final Office Action, Applicant amends the Application 
and seeks reconsideration in view of the following remarks. In this Response, Applicant amends 
claims 1-13 and 15-17, and adds claims 18-21. Applicant does not cancel any claims. Accordingly, 
claims 1-13 and 15-21 are pending in the Application. 

I. Amendments to the Specification 

Applicant submits a substitute specification. The substitute specification amends the
originally-filed specification by capitalizing the trademarked term INFINIBAND. Applicant's
specification at paragraph [0019] provides generic terminology for the term "INFINIBAND" in
accordance with MPEP § 608.01(v). The substitute specification also includes amendments to
correct other inadvertent typographical and grammatical errors. Applicant submits that no new
matter is added by the amendments to the specification.

II. Claims Rejected Under 35 U.S.C. § 112

Claim 8 stands rejected under 35 U.S.C. § 112, second paragraph, as being allegedly
indefinite. Specifically, claim 8 recites the trademarked term "INFINIBAND" (see Paper No./Mail
Date 20071105, page 3). Applicant has deleted the term "INFINIBAND" from claim 8 and
respectfully requests withdrawal of the rejection of claim 8.

Notably, Applicant has added new claims 18-20, each of which recites the term "INFINIBAND
switch." MPEP § 608.01(v) states that if a "trademark has a fixed and definite meaning, it constitutes
sufficient identification unless some physical or chemical characteristic of the article or material is
involved in the invention." Applicant submits that one skilled in the art attaches a fixed and
definite meaning to the term "INFINIBAND." That is, the term "INFINIBAND" has been adopted
by the art as a term having a specific meaning and is, therefore, more than merely a trademarked
term. Therefore, Applicant submits that the term "INFINIBAND" is a definite term.

III. Double Patenting

Claims 8-12 stand provisionally rejected on the ground of non-statutory obviousness-type 
double patenting over claims 5-9 of co-pending U.S. Application No. 10/721,213. In addition, 
claims 8-13 and 15-17 are provisionally rejected on the ground of non-statutory obviousness-type 
double patenting over claims 9-14 and 16-18 of co-pending U.S. Application No. 10/722,022. 
Applicant notes that these rejections are provisional and will respond if the
rejections are finalized.

IV. Claims Rejected Under 35 U.S.C. § 102 

Claims 1 and 3 stand rejected under 35 U.S.C. § 102(b) as being anticipated by U.S. Patent 
No. 6,400,681 issued to Bertin et al. ("Bertin"). Applicant respectfully traverses the rejection, at
least in view of the amendments to independent claim 1.

To anticipate a claim, the cited reference must disclose each and every element of the 
rejected claim (see MPEP § 2131). Amended claim 1 defines, inter alia, a connection controller for
a network comprising a plurality of second stage switches coupled to each of a plurality of first stage
switches, the connection controller comprising a packing algorithm circuit configured to compute an
actual traffic pattern for the packet based on a received network topology data and a received traffic
pattern request, wherein the actual traffic pattern comprises one of the plurality of first stage
switches and one of the plurality of second stage switches such that the network is able to operate as
a strictly non-interfering network. Applicant submits that Bertin fails to disclose at least these
elements of claim 1. 

In making the rejection, the Patent Office alleges Bertin discloses:

a packing algorithm coupled to receive a requested traffic pattern and
to compute an actual traffic pattern of a packet, wherein the packing
algorithm computes an actual traffic pattern using the network
topology data and the requested traffic pattern such that the network
operates as a strictly non-interfering network. (Paper No./Mail Date
20071105, page 5, citing Bertin Col. 5, lines 63-65 and FIG. 7).

Applicant disagrees with the Patent Office's characterization of the disclosure in Bertin. 

As discussed above, claim 1, as amended, recites that the network comprises "a plurality of
second stage switches coupled to each of the plurality of first stage switches," and "wherein the
actual traffic pattern comprises one of the plurality of first stage switches and one of the plurality of
second stage switches such that the network is able to operate as a strictly non-interfering network."
In making the rejection, the Patent Office does not cite Bertin as disclosing such elements.
Moreover, in reviewing Bertin, Applicant is unable to discern any sections of Bertin disclosing at
least "a plurality of second stage switches coupled to each of the plurality of first stage switches,"
and "wherein the actual traffic pattern comprises one of the plurality of first stage switches and one
of the plurality of second stage switches such that the network is able to operate as a strictly
non-interfering network," as recited in claim 1. That is, Applicant submits that Bertin's "method and
process for minimizing the time to select an optimal routing path between an origin and a destination
node in large communication networks" differs from the connection controller defined in claim 1
(Bertin, Col. 1, lines 7-10).

Applicant submits that Bertin discloses a packet switching network having eight
interconnected nodes (see reference numerals 201-208 in FIG. 2), wherein each node includes a
packet switch (see reference numeral 302 in FIG. 3) for routing packets through the network via one
another (see Bertin, Col. 7, lines 8-12 and Col. 8, lines 2-16). Specifically, and with reference to
FIG. 2 of Bertin, Bertin's nodes 201-208 (packet switches 302) are randomly interconnected, which
prevents nodes 201-208 from operating as a strictly non-interfering network, or by definition, a
network where "competing traffic sources do not attempt to use the same resources at the same time"
by using dedicated resources (Applicant's specification, paragraph [0025]). That is, because of the
manner in which nodes 201-208 are interconnected (i.e., randomly interconnected), nodes 201-208, at
least in some instances, will compete with one another for network resources since there are not
enough resources (e.g., links, switches, nodes, etc.) for creating a dedicated path between each
source node and each destination node. In other words, at least two paths between sources and
destinations in Bertin's network include a common link, switch, and/or node.

By contrast, and with reference to Applicant's FIGS. 1-3, a plurality of second stage switches
(e.g., second stage switches 118 (see FIG. 1), 218 (see FIG. 2), and 318 (see FIG. 3)) coupled to
each of a plurality of first stage switches (e.g., first stage switches 116 (see FIG. 1), 216 (see FIG. 2),
and 350 (see FIG. 3)) do not compete with one another for network resources. That is, a two-stage
switching topology provides enough resources that each path between each source and each
destination does not have to share resources. Therefore, Bertin fails to disclose each and every
element of claim 1.

The failure of Bertin to disclose each and every element of claim 1 is fatal to the anticipation 
rejection. Therefore, claim 1 is not anticipated by Bertin. Accordingly, Applicant respectfully 
requests withdrawal of the rejection of independent claim 1. 

Claim 3 depends from claim 1 and includes all of the elements thereof. Therefore, Applicant
submits that claim 3 is not anticipated by Bertin at least for the same reasons as claim 1, in addition
to its own features. Accordingly, Applicant respectfully requests withdrawal of the rejection of
claim 3.

V. Claims Rejected Under 35 U.S.C. § 103
A. Bertin in view of Brahmaroutu 

Claims 2, 4-6, 8, 10-13, and 16-17 stand rejected under 35 U.S.C. § 103(a) as being obvious
over Bertin in view of U.S. Patent Application Publication No. 2003/0033427 A1 filed by
Brahmaroutu ("Brahmaroutu"). Applicant respectfully traverses the rejection, at least in view of the
amendments to independent claims 1, 8, and 13.

To render a claim obvious, the cited references must teach or suggest each and every element
of the rejected claim (see MPEP § 2143). Claims 2 and 4-6 depend from claim 1 and include all of
the elements thereof. Furthermore, Applicant submits that claims 8, 10-13, and 16-17 each recite
elements similar to the elements of independent claim 1 discussed above with respect to the
anticipation rejection based on Bertin. In rejecting claims 2, 4-6, 8, 10-13, and 16-17, the Patent
Office characterizes the disclosure in Bertin in a manner similar to the anticipation rejection of claim 1
discussed above. Applicant has discussed above the failure of Bertin to disclose at least the elements of, "a
plurality of second stage switches coupled to each of the plurality of first stage switches," and
"wherein the actual traffic pattern comprises one of the plurality of first stage switches and one of
the plurality of second stage switches such that the network is able to operate as a strictly
non-interfering network," as recited in claims 2 and 4-6 (via claim 1) and similarly recited in claims 8,
10-13, and 16-17, and submits that such discussion is equally applicable to an obviousness rejection
of claims 2, 4-6, 8, 10-13, and 16-17 based on Bertin. Therefore, Bertin fails to teach or suggest
each and every element of claims 2, 4-6, 8, 10-13, and 16-17. The Patent Office relies on the
disclosure in Brahmaroutu to cure the defects of Bertin; however, Applicant submits that
Brahmaroutu fails to cure such defects.

In making the rejection, the Patent Office alleges that Brahmaroutu discloses "an InfiniBand
switch that routes packets based on the information in a forwarding table," and that "every switch
and each port may have one or more Local Identifiers (LIDs)" (Paper No./Mail Date 20071105,
pages 11 and 12, respectively, citations omitted). The Patent Office does not cite Brahmaroutu as
disclosing the elements of, "a plurality of second stage switches coupled to each of the plurality of
first stage switches," and "wherein the actual traffic pattern comprises one of the plurality of first
stage switches and one of the plurality of second stage switches such that the network is able to
operate as a strictly non-interfering network," as recited in claims 2 and 4-6 (via claim 1) and
similarly recited in claims 8, 10-13, and 16-17. Moreover, in reviewing Brahmaroutu, Applicant is
unable to discern any sections of Brahmaroutu as disclosing such elements. Therefore,
Brahmaroutu fails to cure the defects of Bertin.

The failure of the combination of Bertin and Brahmaroutu to teach or suggest each and every
element of claims 2, 4-6, 8, 10-13, and 16-17 is fatal to the obviousness rejection. Therefore, claims
2, 4-6, 8, 10-13, and 16-17 are not obvious over Bertin in view of Brahmaroutu. Accordingly,
Applicant respectfully requests withdrawal of the rejection of claims 2, 4-6, 8, 10-13, and 16-17.

B. Bertin in view of Yang 

Claim 7 stands rejected under 35 U.S.C. § 103(a) as being obvious over Bertin in view of
U.S. Patent No. 5,940,389 issued to Yang et al. ("Yang"). Applicant respectfully traverses the
rejection, at least in view of the amendments to independent claim 1, from which claim 7 depends.

To render a claim obvious, the cited references must teach or suggest each and every element
of the rejected claim (see MPEP § 2143). Claim 7 depends from claim 1 and includes all of the
elements thereof. In rejecting claim 7, the Patent Office characterizes the disclosure in Bertin
in a manner similar to the anticipation rejection of claim 1 discussed above. Applicant has discussed above the
failure of Bertin to disclose at least the elements of, "a plurality of second stage switches coupled to
each of the plurality of first stage switches," and "wherein the actual traffic pattern comprises one of
the plurality of first stage switches and one of the plurality of second stage switches such that the
network is able to operate as a strictly non-interfering network," as recited in claim 7 (via claim 1),
and submits that such discussion is equally applicable to an obviousness rejection of claim 7 based
on Bertin. Therefore, Bertin fails to teach or suggest each and every element of claim 7. The Patent
Office relies on the disclosure in Yang to cure the defects of Bertin; however, Applicant submits that
Yang fails to cure such defects.

In making the rejection, the Patent Office alleges that Yang discloses "a Benes Network,
which is a special case of a CLOS network, [that] can be used as a switch fabric and that each node
in the network has an entry, which is indexed by an identifier and contains information regarding
how to transmit received cells to the next node in the routing table" (Paper No./Mail Date 20071105,
page 15, citations omitted). The Patent Office does not cite Yang as disclosing the elements of, "a
plurality of second stage switches coupled to each of the plurality of first stage switches," and
"wherein the actual traffic pattern comprises one of the plurality of first stage switches and one of
the plurality of second stage switches such that the network is able to operate as a strictly
non-interfering network," as recited in claim 7 (via claim 1). Moreover, in reviewing Yang, Applicant is
unable to discern any sections of Yang as disclosing such elements. Therefore, Yang fails to cure the
defects of Bertin.

The failure of the combination of Bertin and Yang to teach or suggest each and every element 
of claim 7 is fatal to the obviousness rejection. Therefore, claim 7 is not obvious over Bertin in view 
of Yang. Accordingly, Applicant respectfully requests withdrawal of the rejection of claim 7.

C. Bertin in view of Brahmaroutu and Yang

Claims 9 and 15 stand rejected under 35 U.S.C. § 103(a) as being obvious over Bertin in
view of Brahmaroutu and Yang. Applicant respectfully traverses the rejection, at least in view of the
amendments to independent claims 8 and 13, from which claims 9 and 15 depend, respectively.

To render a claim obvious, the cited references must teach or suggest each and every element
of the rejected claim (see MPEP § 2143). Claim 9 depends from claim 8 and includes all of the
elements thereof, and claim 15 depends from claim 13 and includes all of the elements thereof. In
rejecting claims 9 and 15, the Patent Office characterizes the disclosures in Bertin, Brahmaroutu,
and Yang in a manner similar to the various rejections discussed above. Applicant has discussed above the
failure of Bertin, Brahmaroutu, and Yang to disclose elements that are similar to at least the
elements of, "a plurality of second stage switches coupled to each of the plurality of first stage
switches," and "wherein the actual traffic pattern comprises one of the plurality of first stage
switches and one of the plurality of second stage switches such that the network is able to operate as
a strictly non-interfering network," as recited in claims 9 and 15 (via claims 8 and 13, respectively),
and submits that such discussion is equally applicable to an obviousness rejection of claims 9 and 15
based on Bertin, Brahmaroutu, and Yang. Therefore, the combination of Bertin, Brahmaroutu, and
Yang fails to teach or suggest each and every element of claims 9 and 15. Accordingly, Applicant
respectfully requests withdrawal of the rejection of claims 9 and 15.

VI. Claim Amendments

Claims 2-7, 9-12, and 15-17 have been amended so that various elements recited in these
claims are consistent with their respective independent claims. 

VII. New Claims 

Applicant has added new claims 18-21. New claim 18 depends from claim 1, new claim 19
depends from claim 8, and new claims 20-21 depend from independent claims 1, 8, and 13,
respectively, and include all of the elements of their respective independent claims. Therefore, in
view of the various discussions above, Applicant submits that claims 18-21 are in condition for
allowance.

CONCLUSION 

In view of the foregoing, it is believed that all claims now pending are in condition for 
allowance. A Notice of Allowance is earnestly solicited at the earliest possible date. If the Patent 
Office believes that a telephone conference would be useful in moving the application forward to
allowance, the Patent Office is encouraged to contact the undersigned at (480) 385-5060 or
jgraff@ifllaw.com.

If necessary, the Commissioner is hereby authorized to charge payment or credit any
overpayment to Deposit Account No. 50-2091 for any fees required under 37 C.F.R. §§ 1.16 or 1.17,
particularly extension of time fees. 

Respectfully submitted,

Date December 28, 2007 /JASON R. GRAFF/ 

Jason R. Graff 
Reg. No. 54,134 

Attachments: Replacement Specification

MARKED-UP VERSION



RELATED CASES 

[0001] Related subject matter is disclosed in U.S. patent application entitled "METHOD OF
OPERATING A CLOS NETWORK" having application Ser. No. [[ ]] 10/722,048 and
filed on the same date herewith and assigned to the same assignee.

[0002] Related subject matter is disclosed in U.S. patent application entitled "INFINIBAND
SWITCH OPERATING IN A CLOS NETWORK" having application Ser. No. [[ ]]
10/722,213 and filed on the same date herewith and assigned to the same assignee.

[0003] Related subject matter is disclosed in U.S. patent application entitled "STRICTLY
NON-INTERFERING NETWORK" having application Ser. No. [[ ]] 10/722,022 and filed on
the same date herewith and assigned to the same assignee.

BACKGROUND OF THE INVENTION 

[0004] Current switching topologies for network operations can cause a network to suffer 
performance degradation due to latency. Significant delays from latency can result from queuing
delays in network switches due to interference caused by competing traffic sources attempting to 
use the same network resources at the same time. This can cause packets to queue up in one or 
more switches and delay the packet's delivery to its destination. This increase in latency slows 
network response time and can result in lost packets and other disadvantageous network 
behavior. 

[0005] Accordingly, there is a significant need for an apparatus and method that overcomes the 
deficiencies of the prior art outlined above. 



BRIEF DESCRIPTION OF THE DRAWINGS 



[0006] Referring to the drawing: 

[0007] FIG. 1 depicts a network according to one embodiment of the invention; 

[0008] FIG. 2 depicts a network according to another embodiment of the invention; 

[0009] FIG. 3 depicts a network according to yet another embodiment of the invention; 

[0010] FIG. 4 depicts a block diagram of a network according to an embodiment of the 
invention; 

[0011] FIG. 5 illustrates a flow diagram of a method of the invention according to an 
embodiment of the invention; 

[0012] FIG. 6 illustrates a flow diagram of a method of the invention according to another 
embodiment of the invention; and 

[0013] FIG. 7 illustrates a flow diagram of a method of the invention according to yet another 
embodiment of the invention. 

[0014] It will be appreciated that for simplicity and clarity of illustration, elements shown in the 
drawing have not necessarily been drawn to scale. For example, the dimensions of some of the 
elements are exaggerated relative to each other. Further, where considered appropriate, reference 
numerals have been repeated among the Figures to indicate corresponding elements. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0015] In the following detailed description of exemplary embodiments of the invention, 
reference is made to the accompanying drawings that illustrate specific exemplary embodiments
in which the invention may be practiced. These embodiments are described in sufficient detail to
enable those skilled in the art to practice the invention, but other embodiments may be utilized 
and logical, mechanical, electrical and other changes may be made without departing from the 
scope of the present invention. The following detailed description is, therefore, not to be taken in 
a limiting sense, and the scope of the present invention is defined only by the appended claims. 

[0016] In the following description, numerous specific details are set forth to provide a thorough 
understanding of the invention. However, it is understood that the invention may be practiced 
without these specific details. In other instances, well-known circuits, structures and techniques 
have not been shown in detail in order not to obscure the invention. 

[0017] In the following description and claims, the terms "coupled" and "connected," along with 
their derivatives, may be used. It should be understood that these terms are not intended as 
synonyms for each other. Rather, in particular embodiments, "connected" may be used to 
indicate that two or more elements are in direct physical or electrical contact. However, 
"coupled" may mean that two or more elements are not in direct contact with each other, but yet 
still co-operate or interact with each other. 

[0018] For clarity of explanation, the embodiments of the present invention are presented, in 
part, as comprising individual functional blocks. The functions represented by these blocks may
be provided through the use of software, or shared or dedicated hardware, including, but not 
limited to, hardware capable of executing software. The present invention is not limited to 
implementation by any particular set of elements, and the description herein is merely 
representational of one embodiment. 

[0019] FIG. 1 depicts a network 100 according to one embodiment of the invention. In an 
embodiment, network 100 can be implemented in one or more chassis in a backplane-type 
interconnect environment. In another embodiment, network 100 can be implemented on the same 
switching board or switching chip. Network 100 may utilize a packet data protocol for traffic 
movement among switches and end-node devices. For example, network 100 may use InfiniBand
INFINIBAND. InfiniBand INFINIBAND is specified by the InfiniBand™ INFINIBAND™
Architecture Specification, Release 1.1 or later, as promulgated by the InfiniBand™
INFINIBAND™ Trade Association, 5440 SW Westgate Drive, Suite 217, Portland, Oreg.
Oregon 97221. As such, network 100 utilizes data packets having fixed or variable length,
defined by the applicable protocol.

[0020] The network 100 depicted in FIG. 1 includes first stage InfiniBand INFINIBAND 
switches 116 coupled to second stage InfiniBand INFINIBAND switches 118 by a plurality of 
links 115. In an embodiment, each of plurality of links 115 can be bi-directional. In an 
embodiment, plurality of links 115 operated under InfiniBand INFINIBAND can be 1x, 4x or
12x speed links. In an embodiment, each of first stage InfiniBand INFINIBAND switches 116
can be coupled to one or more of a plurality of end nodes 114. Each of plurality of end nodes 114
can be, for example and without limitation, application servers, database servers, and the like. In
an embodiment, each of plurality of end nodes 114 can act as a source (i.e. creating a packet and
placing it in network 100), or a destination (an end point for a packet created by a source). In
another embodiment, one or more of each of plurality of end nodes 114 can act as both a source
for one packet, and as a destination for another packet. For example, source 122 can create a 
packet with a destination 126. In an embodiment, network 100 is a non-blocking network. 

[0021] In an embodiment, two or more first stage InfiniBand INFINIBAND switches 116 may 
be implemented within a single switching entity, for example a single switching chip, physical 
switching unit, and the like. Also, two or more of second stage InfiniBand INFINIBAND 
switches 118 may be implemented within a single switching entity. In yet another embodiment, 
two or more InfiniBand INFINIBAND switches may be functionally replaced with either a 
single InfiniBand INFINIBAND switch or a subnetwork with a non-blocking topology. In an 
exemplary embodiment of the invention, network 100 can be built using any number of 
InfiniBand INFINIBAND switches, where an InfiniBand INFINIBAND switch can be a 24-port 
Mellanox Anafa-II InfiniBand INFINIBAND Switch, manufactured by Mellanox Technologies,
Inc., 2900 Stender Way, Santa Clara, Ca. California 95054. The invention is not limited to the
use of this switch and another type or model of InfiniBand INFINIBAND switch may be used 
and be within the scope of the invention. 



[0022] The plurality of links 115 can use, for example and without limitation, 100 ohm 
differential transmit and receive pairs per channel. Each channel can use high-speed 
serialization/deserialization (SERDES) and 8b/10b encoding. 
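
By way of illustration only and without limitation, the relationship between link width, 8b/10b encoding, and usable bandwidth can be sketched in Python as follows. The per-lane signaling rate of 2.5 Gbit/s is an assumption (the single data rate figure commonly associated with 1x links) and is not stated in this description; the function name is hypothetical.

```python
# Assumption: 2.5 Gbit/s signaling per lane; this figure is not given in the text.
LANE_SIGNALING_GBPS = 2.5
# 8b/10b encoding carries 8 data bits in every 10 transmitted bits.
ENCODING_EFFICIENCY = 8 / 10


def effective_data_rate_gbps(lanes: int) -> float:
    """Return the payload capacity of a link that is `lanes` lanes wide."""
    return lanes * LANE_SIGNALING_GBPS * ENCODING_EFFICIENCY


for width in (1, 4, 12):
    rate = effective_data_rate_gbps(width)
    print(f"{width}x link: {rate:.1f} Gbit/s of payload capacity")
```

Under the stated assumption, one fifth of the raw signaling rate is consumed by the encoding regardless of link width.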

[0023] In network terminology, admissible traffic patterns are traffic patterns in an InfiniBand
INFINIBAND network where the traffic entering the InfiniBand INFINIBAND network does 
not exceed the InfiniBand INFINIBAND network's ability to output traffic. Interference in a 
network occurs when competing traffic sources attempt to use the same network resources at the 
same time. This can result in a degradation of the sustained rate of data transfer which one or 
more of the sources can maintain. It will either result in an increased latency or packet loss. In a 
network operating using InfiniBand INFINIBAND , link flow control algorithms guarantee that 
short-term congestion will not result in packet loss. Therefore, in a network operating using 
InfiniBand INFINIBAND , short-term congestion will manifest itself as increased data transfer 
latency. 
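
As a non-limiting sketch, one possible reading of the admissibility condition above is that the traffic offered to each destination must not exceed the rate at which that destination can drain the network. The Python fragment below checks that reading; the flow representation, the capacity table, and the function name are assumptions made for illustration and do not appear in this description.

```python
from collections import defaultdict


def is_admissible(flows, egress_capacity):
    """Check one reading of admissibility.

    flows: iterable of (source, destination, rate) tuples (hypothetical format).
    egress_capacity: dict mapping destination -> maximum output rate.
    Returns True if no destination is offered more than it can output.
    """
    offered = defaultdict(float)
    for _source, destination, rate in flows:
        offered[destination] += rate
    # A destination missing from the table is treated as having zero capacity.
    return all(load <= egress_capacity.get(dest, 0.0)
               for dest, load in offered.items())


flows = [("node_a", "node_c", 1.0), ("node_b", "node_c", 1.0)]
print(is_admissible(flows, {"node_c": 2.0}))
print(is_admissible(flows, {"node_c": 1.5}))
```

A fuller check might also bound the rate each source injects; that is omitted here for brevity.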

[0024] A non-interfering network (i.e. a network without interference) is a network for which the 
performance degradation for any admissible traffic pattern is guaranteed to conform to a pre- 
specified bound. This bound can be either deterministic or statistical. For example, a network can 
be deemed non-interfering if the worst-case end-to-end latency is guaranteed to be less than ten 
microseconds. This is an example of a deterministic bound. As another example, a network can 
be deemed non-interfering if 99% of packets experience network latencies of less than two 
microseconds. This is an example of a statistical bound. These are just examples and are not 
limiting of the invention. The appropriate choice for a pre-specified bound is application 
specific, and a network supporting multiple applications can impose different bounds on 
performance on each traffic type. 
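
For illustration only, the two kinds of pre-specified bound described above can be expressed as simple checks over a set of measured packet latencies. The Python sketch below is an assumption-laden example: the function names, the use of measured samples, and the integer percentage parameter are hypothetical and are not part of this description.

```python
def meets_deterministic_bound(latencies_us, bound_us):
    """True if the worst-case latency in the samples is below the bound."""
    if not latencies_us:
        raise ValueError("at least one latency sample is required")
    return max(latencies_us) < bound_us


def meets_statistical_bound(latencies_us, bound_us, percent):
    """True if at least `percent` percent of samples are below the bound."""
    if not latencies_us:
        raise ValueError("at least one latency sample is required")
    within = sum(1 for latency in latencies_us if latency < bound_us)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return within * 100 >= percent * len(latencies_us)


# Hypothetical samples in microseconds: 99 fast packets and one slow packet.
samples = [1.0] * 99 + [9.0]
print(meets_deterministic_bound(samples, 10.0))
print(meets_statistical_bound(samples, 2.0, 99))
```

With these hypothetical samples, both the ten microsecond deterministic bound and the 99% / two microsecond statistical bound from the examples above are satisfied.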

[0025] A strictly non-interfering network (SNIN) is a network for which the only queuing delays 
experienced by an admissible traffic pattern are attributable to the multiplexing of packets from
slow links onto a faster link whose aggregate bandwidth at least equals the sum of the
bandwidths of the smaller links. In a SNIN, competing traffic sources do not attempt to use the
same network resources at the same time. The implementation of a SNIN requires that resources
be dedicated through the network in support of an active communication session. In order to 
accomplish this, non-blocking networks can be used. 
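
As a minimal sketch, and without limitation, the requirement that resources be dedicated to each active session can be approximated by verifying that no link is used by more than one assigned path. The Python fragment below performs only this link-disjointness check, which is a simplification assumed for illustration and not a complete test for a SNIN; the path representation and function name are hypothetical.

```python
def shares_no_links(paths):
    """Return True if no directed link appears in more than one place.

    paths: list of switch sequences, e.g. (106, 102, 108) for a packet that
    enters at switch 106, crosses switch 102, and leaves at switch 108.
    """
    seen = set()
    for path in paths:
        # Consecutive switches in a path form one directed link.
        for link in zip(path, path[1:]):
            if link in seen:
                return False
            seen.add(link)
    return True


print(shares_no_links([(106, 102, 108), (110, 104, 108)]))
print(shares_no_links([(106, 102, 108), (110, 102, 108)]))
```

In the second call both paths use the link from switch 102 to switch 108, so the two sessions would compete for that link.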

[0026] A network is non-blocking if it has adequate internal resources to carry out all possible 
admissible traffic patterns. There are different degrees of non-blocking performance based upon 
the sophistication of the control policy required to achieve non-blocking performance. 

[0027] Most network switching applications allow the establishment of new connections and the 
tear down of old ones. It is possible that for a network with a non-blocking topology, a new 
connection can be blocked due to poor or unfortunate assignment of previously established 
connections. A strictly non-blocking network is a network for which any new admissible 
connection may be accepted independent of the state of preexisting connections, or the policy 
used to reroute preexisting connections, without changing the routes of the preexisting 
connections. A crossbar network is an example of a strictly non-blocking network. As another 
example, a rearrangeably non-blocking network is a network that may be augmented by a
mechanism to reroute preexisting connections such that it is possible to carry the preexisting 
connections and any new admissible connection. 

[0028] Another type of non-blocking network is a Clos CLOS network. Clos CLOS networks are
known in the art. For example, see "A Study of Non-Blocking Switching Networks" by Charles
Clos, Bell System Technical Journal, 1953, vol. 32, no. 2, pp. 406-424. In an embodiment, Clos
CLOS networks can include FAT trees and K-nary arrays, other non-blocking networks, and the
like. In an embodiment, network 100 is a Clos CLOS network 120. In an embodiment, Clos
CLOS network 120 can be a two stage hierarchical network in which each node in the first stage
connects to each node in the second stage through a plurality of links 115. In the embodiment 
shown in FIG. 1, first stage InfiniBand INFINIBAND switches 116 can be considered the first 
stage and second stage InfiniBand INFINIBAND switches 118 can be considered the second 
stage. 



[0029] As an illustration of an embodiment of the invention, traffic can traverse network 100.
Traffic (i.e. a packet) originating at end node 122 can enter InfiniBand INFINIBAND switch 106
through an end-node port 112 and pass through an internal switch link. The packet proceeds to one
of second stage InfiniBand INFINIBAND switches 118, for example InfiniBand INFINIBAND
switch 102, via one of plurality of links 115 (where plurality of links 115 are bi-directional). The
packet crosses through internal switch link at InfiniBand INFINIBAND switch 102, and back to
one of first stage InfiniBand INFINIBAND switches 116, for example InfiniBand INFINIBAND
switch 108, via one of plurality of links 115. The packet can then proceed to an end node coupled
to InfiniBand INFINIBAND switch 108, for example end node 126.
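
The traversal described above can be sketched, for illustration only, by enumerating the candidate routes between two first stage switches of FIG. 1. The Python fragment below uses the switch reference numerals of FIG. 1 (first stage 106, 108, 110 and second stage 102, 104); the function name and the treatment of traffic that enters and leaves at the same first stage switch are assumptions made for the example.

```python
FIRST_STAGE = (106, 108, 110)  # first stage switches of FIG. 1
SECOND_STAGE = (102, 104)      # second stage switches of FIG. 1


def candidate_paths(src_switch, dst_switch):
    """List the switch sequences a packet could follow between two
    first stage switches, one per second stage switch."""
    if src_switch not in FIRST_STAGE or dst_switch not in FIRST_STAGE:
        raise ValueError("both endpoints must be first stage switches")
    if src_switch == dst_switch:
        # Assumption: such traffic stays inside the one switch.
        return [(src_switch,)]
    return [(src_switch, middle, dst_switch) for middle in SECOND_STAGE]


for path in candidate_paths(106, 108):
    print(" -> ".join(str(switch) for switch in path))
```

The route given in the text, from switch 106 through switch 102 to switch 108, is the first of the two candidates listed.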

[0030] Although only one of plurality of links 115 is shown between each first stage InfiniBand
INFINIBAND switches 116 and second stage InfiniBand INFINIBAND switches 118, the 
invention is not limited to only one link. In other embodiments there can be more than one of 
plurality of links 115 between each of first stage InfiniBand INFINIBAND switches 116 and 
each of second stage InfiniBand INFINIBAND switches 118. 

[0031] The number of plurality of links 115 between each pairing of first stage INFINIBAND
switches 116 and second stage INFINIBAND switches 118 compared to the number of end-node ports
on each of first stage INFINIBAND switches 116 determines the degree of blocking potentially
experienced by traffic crossing CLOS network 120. For example, if the number of second stage
INFINIBAND switches 118 is greater than or equal to the number of end node ports 112 on a first
stage INFINIBAND switch 116, then CLOS network 120 is a rearrangably non-blocking CLOS network.
As explained above, network 100 is non-blocking if it has adequate internal resources to carry
out all admissible traffic patterns. As another example, CLOS network 120 is strictly
non-blocking if the number of second stage INFINIBAND switches 118 is equal to or greater than
2*(number of end-node ports 112)-1.
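The two thresholds above can be illustrated with a minimal sketch. It assumes a single link between each first stage switch and each second stage switch; the function name and return strings are hypothetical and are not part of the specification.

```python
def classify_clos(num_second_stage: int, end_ports_per_first_stage: int) -> str:
    """Classify a two stage CLOS network using the thresholds stated in [0031].

    Assumes exactly one link between each first stage / second stage pair.
    """
    if num_second_stage >= 2 * end_ports_per_first_stage - 1:
        return "strictly non-blocking"
    if num_second_stage >= end_ports_per_first_stage:
        return "rearrangably non-blocking"
    return "blocking"


# Hypothetical example: four end-node ports per first stage switch.
four_spines = classify_clos(4, 4)
seven_spines = classify_clos(7, 4)
two_spines = classify_clos(2, 4)
```

The strict test is checked first because any network that meets it also meets the weaker rearrangable threshold.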

[0032] Although FIG. 1 depicts a two stage hierarchical network, which can be a CLOS network
120, this is not limiting of the invention. Network 100 and CLOS network 120 can have any number
of hierarchical stages and be within the scope of the invention. In other words, multistage
networks and multistage CLOS networks are within the scope of the invention.



[0033] Although FIG. 1 depicts three first stage INFINIBAND switches 116, specifically,
INFINIBAND switches 106, 108, 110, and two second stage INFINIBAND switches 118, specifically
INFINIBAND switches 102, 104, any number of first stage INFINIBAND switches 116 and second stage
INFINIBAND switches 118 are within the scope of the invention. Also, any number of end-node
ports 112 are within the scope of the invention. Further, any number of switch interlink ports
coupling INFINIBAND switches to each other via plurality of links 115 are within the scope of
the invention. Still further, any number of plurality of end nodes 114 are within the scope of
the invention.

[0034] FIG. 2 depicts a network 200 according to another embodiment of the invention. As shown
in FIG. 2, network 200 includes first stage INFINIBAND switches 216 coupled to second stage
INFINIBAND switches 218 via plurality of links. In an embodiment, network 200 can be a CLOS
network 220 since each node in the first stage connects to each node in the second stage. In an
embodiment, each of plurality of first stage INFINIBAND switches 216 can be coupled to one or
more of plurality of end nodes (not shown for clarity), via plurality of end node ports. For
example, INFINIBAND switch 210 can comprise plurality of end node ports 252, INFINIBAND switch
211 can comprise plurality of end node ports 254, INFINIBAND switch 212 can comprise plurality
of end node ports 256, and INFINIBAND switch 213 can comprise plurality of end node ports 258.

[0035] In the embodiment depicted in FIG. 2, second stage INFINIBAND switches 218 include
INFINIBAND switches 202, 204, 206, 208. In network 200, particularly in CLOS network 220, the
stage of INFINIBAND switches furthest from plurality of end nodes is referred to as spine nodes.
In the embodiment depicted in FIG. 2, second stage INFINIBAND switches 218 can be considered
spine nodes. Therefore, in this embodiment, each INFINIBAND switch 202, 204, 206, 208 is a spine
node.



[0036] A spanning tree is any group of nodes and links (where nodes can be INFINIBAND switches,
end nodes, and the like) containing a unique path between every pair of nodes in the network. A
routing tree is a spanning tree that is rooted at a spine node and that defines the shortest
path tree from the spine node to each end node.
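As a hedged illustration of the routing tree definition above, the sketch below builds a shortest path tree rooted at one spine node with a breadth-first search. It assumes every link has equal cost, and it leaves the other spine nodes out of the tree so that the result matches the routing trees described with reference to FIG. 2 (one spine node, the first stage switches, and the end nodes). The topology, node names and function name are hypothetical.

```python
from collections import deque


def routing_tree(adjacency, root_spine, all_spines):
    """Return {node: parent} for a shortest path tree rooted at root_spine.

    adjacency maps each node name to a list of neighbor names. Other spine
    nodes are skipped (an assumption made for this sketch).
    """
    parent = {root_spine: None}
    queue = deque([root_spine])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor in parent:
                continue
            if neighbor in all_spines and neighbor != root_spine:
                continue
            parent[neighbor] = node
            queue.append(neighbor)
    return parent


# Hypothetical topology: two spine nodes, two first stage switches, two end nodes.
adjacency = {
    "S1": ["L1", "L2"],
    "S2": ["L1", "L2"],
    "L1": ["S1", "S2", "E1"],
    "L2": ["S1", "S2", "E2"],
    "E1": ["L1"],
    "E2": ["L2"],
}
tree_s1 = routing_tree(adjacency, "S1", {"S1", "S2"})
```

Each entry in the returned mapping names the next node on the way back to the spine, so following parents from any end node yields its unique path to the root.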

[0037] In network 200, there is a routing tree for each of second stage INFINIBAND switches 218.
In an embodiment, routing tree 230 includes INFINIBAND switch 202, which is a spine node, and
associated links to each of first stage INFINIBAND switches 216 and associated inter-switch
links through each of first stage INFINIBAND switches 216 to each of plurality of end node ports
252, 254, 256, 258.

[0038] In an embodiment, routing tree 232 includes INFINIBAND switch 204, which is a spine node,
and associated links 225 to each of first stage INFINIBAND switches 216 and associated
inter-switch links through each of first stage INFINIBAND switches 216 to each of plurality of
end node ports 252, 254, 256, 258.

[0039] In an embodiment, routing tree 234 includes INFINIBAND switch 206, which is a spine node,
and associated links to each of first stage INFINIBAND switches 216 and associated inter-switch
links through each of first stage INFINIBAND switches 216 to each of plurality of end node ports
252, 254, 256, 258.

[0040] In an embodiment, routing tree 236 includes INFINIBAND switch 208, which is a spine node,
and associated links to each of first stage INFINIBAND switches 216 and associated inter-switch
links through each of first stage INFINIBAND switches 216 to each of plurality of end node ports
252, 254, 256, 258.

[0041] In an illustration of an embodiment, a packet created at an end node coupled to
INFINIBAND switch 210 can traverse a path 225. Packet can enter INFINIBAND switch 210 via end
node port 221, traverse inter-switch link 229, continue on a link to INFINIBAND switch 202,
traverse inter-switch link 227, travel to INFINIBAND switch 211, traverse inter-switch link 231,
and exit end node port 223 to another end node. In this embodiment, the packet travels path 225
between an end node coupled to INFINIBAND switch 210 and an end node coupled to INFINIBAND
switch 211. In an embodiment, path 225 is a shortest path 225 between spine node 202 and each of
plurality of end nodes. In this embodiment, the packet traveled from a source to a destination
using routing tree 230. As is known in the art, each of destinations in network 200 operating
using INFINIBAND has a Base Local Identifier, known as a BaseLID 237, which is analogous to an
address of the destination.

[0042] In this embodiment, any packet created at a source needs a BaseLID of the destination and
a routing tree to define a unique path from the source to the destination. In an embodiment, the
sum of the BaseLID and the routing tree (which can be, for example, a routing tree ID) can be a
Destination Local Identifier (DLID). DLID includes the destination port (as designated by
BaseLID) and the path to get there from the source, where the path is identified by, for example
and without limitation, a routing tree ID.
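The DLID arithmetic described above can be sketched as follows. The sketch assumes that each BaseLID is aligned to a multiple of the number of routing trees, so that the DLID ranges of different destinations never overlap and the two components can be recovered from the sum. That alignment, and the function and parameter names, are assumptions made for illustration only.

```python
def make_dlid(base_lid: int, tree_id: int, num_trees: int) -> int:
    """Combine a destination BaseLID and a routing tree ID into a DLID."""
    if base_lid % num_trees != 0:
        raise ValueError("sketch assumes BaseLIDs aligned to num_trees")
    if not 0 <= tree_id < num_trees:
        raise ValueError("tree_id out of range")
    return base_lid + tree_id


def split_dlid(dlid: int, num_trees: int) -> tuple:
    """Recover (base_lid, tree_id) from a DLID built by make_dlid."""
    tree_id = dlid % num_trees
    return dlid - tree_id, tree_id


# Hypothetical values: four routing trees, destination BaseLID 16, tree ID 2.
example_dlid = make_dlid(16, 2, 4)
example_parts = split_dlid(example_dlid, 4)
```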

[0043] In an embodiment, network 200 can be a CLOS network 220, and also a rearrangably
non-blocking CLOS network since the number of second stage INFINIBAND switches 218 is greater
than or equal to the number of end node ports on a first stage INFINIBAND switch 216. In another
embodiment, network 200 can be a strictly non-blocking CLOS network since the number of second
stage INFINIBAND switches 218 is equal to or greater than 2*(number of end node ports on a first
stage INFINIBAND switch 216)-1. In an embodiment, traffic in network 200 can be scheduled such
that the only queuing delays experienced by an admissible traffic pattern are attributable to
the multiplexing of packets from slow links onto a faster link whose aggregate bandwidth at
least equals the sum of the bandwidths of the smaller links. In this embodiment, competing
traffic sources do not attempt to use the same network resources at the same time. As defined
above, network 200 can then be a SNIN 219.

[0044] FIG. 3 depicts a network 300 according to yet another embodiment of the invention. As
shown in FIG. 3, network 300 includes first stage INFINIBAND switches 350 coupled to second
stage INFINIBAND switches 318 via plurality of links. In an embodiment, each of plurality of
first stage INFINIBAND switches 350 can be coupled to one or more of plurality of end nodes (not
shown for clarity), via plurality of end node ports. For example, INFINIBAND switch 310 can
comprise plurality of end node ports 352, INFINIBAND switch 311 can comprise plurality of end
node ports 354, INFINIBAND switch 312 can comprise plurality of end node ports 356, and
INFINIBAND switch 313 can comprise plurality of end node ports 358.

[0045] As is known in the art, a dilated network is one in which the total bandwidth between at 
least one pair of switches is greater than the bandwidth of a link connecting a switch to an end 
node. In an embodiment, network 300 can be a dilated network as there are two links between
each of first stage INFINIBAND switches 350 and second stage INFINIBAND switches 318. Dilated
networks are significant because they allow the cost-effective construction of non-blocking
networks. Dilated networks are also significant when links of differing speeds are used in the
network.

[0046] In an embodiment, network 300 is equivalent to network 200, where network 300 is dilated.
Therefore, network 300 is also a CLOS network 320. Network 300 is more cost-effective as only
two second stage INFINIBAND switches 318 are required. As is known in the art of networking,
equivalence can be shown between network 300 and network 200. Equivalence allows a path in
network 300 to be mapped back to a path in network 200, such that non-interfering traffic flows
remain non-interfering. Any admissible set of connections can be carried by either of network
200 or network 300. Therefore, a dilated network such as network 300 can carry any set of
connections that network 200 can. Therefore, network 300 can be a rearrangably non-blocking CLOS
network, a strictly non-blocking CLOS network and/or a SNIN 319 as was shown with reference to
network 200.

[0047] In the embodiment depicted in FIG. 3, second stage INFINIBAND switches 318 include
INFINIBAND switches 302, 304. In network 300, particularly in CLOS network 320, the stage of
INFINIBAND switches furthest from plurality of end nodes is referred to as spine nodes. In the
embodiment depicted in FIG. 3, second stage INFINIBAND switches 318 can be considered spine
nodes. Therefore, in this embodiment, each of INFINIBAND switches 302, 304 is a spine node. In
network 300, there may be multiple shortest paths between a spine node and an end node. A
generalization can be made from the non-dilated case shown in FIG. 2 by defining a routing tree
in such a way that is sufficient to cover all the paths for a routing tree between a spine node
and the plurality of end nodes.

[0048] In network 300, there are two routing trees for each of second stage INFINIBAND switches
318. In an embodiment, routing tree 330 includes INFINIBAND switch 302, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0049] In an embodiment, routing tree 332 includes INFINIBAND switch 302, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0050] In an embodiment, routing tree 334 includes INFINIBAND switch 304, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0051] In an embodiment, routing tree 336 includes INFINIBAND switch 304, which is a spine node,
and associated links to each of first stage INFINIBAND switches 350 and associated inter-switch
links through each of first stage INFINIBAND switches 350 to each of plurality of end node ports
352, 354, 356, 358.

[0052] In an illustration of an embodiment, a packet created at an end node coupled to
INFINIBAND switch 312 can traverse a path to an end node coupled to INFINIBAND switch 314. The
packet can enter INFINIBAND switch 312 via end node port 321, traverse inter-switch link 329,
continue on a link (using routing tree 334) to INFINIBAND switch 304, traverse inter-switch link
327, travel to INFINIBAND switch 314, traverse inter-switch link 331, and exit end node port 323
to another end node. In this embodiment, the packet travels the path between an end node coupled
to INFINIBAND switch 312 and an end node coupled to INFINIBAND switch 314. In an embodiment, the
path is a shortest path between spine node 304 and each of plurality of end nodes. In this
embodiment, the packet traveled from a source to a destination using routing tree 334. As is
known in the art, each of destinations in network 300 operating using INFINIBAND has a BaseLID.
The sum of the BaseLID and the routing tree (which can be, for example, a routing tree ID) can
be a DLID analogous to that described above with reference to FIG. 2.

[0053] FIG. 4 depicts a block diagram of a network 400 according to an embodiment of the
invention. Network 400 includes a path determination mechanism that programs forwarding tables
of INFINIBAND switches with paths appropriate to make network 400 operate as a SNIN 419. As
shown in FIG. 4, network 400 can include one or more end nodes 406, which are representative of
plurality of end nodes 114 shown in FIG. 1 and referred to in FIG. 2 and FIG. 3. End node 406
can be coupled to a connection controller 402, which is in turn coupled to master subnet manager
404. Master subnet manager 404 is also coupled to each of one or more INFINIBAND switches 401,
which represents any of INFINIBAND switches referred to in FIGS. 1-3.

[0054] Network 400, when operating using INFINIBAND, has one master subnet manager 404, which
can reside on a port, INFINIBAND switch, router, end node, and the like. In another embodiment,
master subnet manager 404 can be distributed among any number of INFINIBAND switches, end nodes
and ports. Master subnet manager 404 can be implemented in hardware or software. When there are
multiple subnet managers in network 400, one subnet manager will include master subnet manager
404 and any other subnet managers within network 400 may become a standby subnet manager.

[0055] In an embodiment, master subnet manager 404 manages network 400 and can initialize and
configure network 400. This can include discovering a topology of network 400, establishing
possible paths among INFINIBAND switches and end nodes, assigning local identifiers to each port
in network 400, sweeping the network and discovering and managing changes in topology of network
400, and the like. In the realm of INFINIBAND, network 400 can be considered a subnet.

[0056] In an embodiment, master subnet manager 404 can include network topology data 405, 
which contains data on network 400 and all paths, INFINIBAND switches, end
nodes, links, and the like. Master subnet manager 404 can also include an SNIN policy entity, 
which can be a mechanism to specify whether the policy of operating network 400 as an SNIN is 
in effect. 

[0057] In an embodiment, connection controller 402 can be a software entity responsible for
receiving a requested traffic pattern 403 from one or more end nodes 406, routing connections in
network 400 in a non-interfering fashion and conveying routing information to respective end
nodes. In other words, connection controller 402 can receive connection requests from end nodes
and amalgamate them to form requested traffic pattern 403. In an embodiment, connection
controller 402 can also communicate with master subnet manager 404 to pre-program end nodes in a
way that is consistent with non-interfering operation of network 400. In an embodiment,
connection controller 402 can reside on a port, INFINIBAND switch, router, end node, and the
like. In another embodiment, connection controller 402 can be distributed among any number of
INFINIBAND switches, end nodes and ports.

[0058] In an embodiment, connection controller 402 can include network topology cache 418, which
maintains a local representation of network topology data 405. In other words, network topology
cache 418 can maintain a local representation of master subnet manager's 404 view of network
topology data 405, including paths established between INFINIBAND switches, end nodes, and the
like, of network 400. Connection controller 402 can also include logical traffic pattern cache
416, which is responsible for storing requested traffic pattern 403 received from one or more of
end nodes 406.



[0059] Connection controller 402 can also include packing algorithm 414, which can combine
requested traffic pattern 403 with network topology data 405 stored in network topology cache
418 to calculate actual traffic pattern 412. In an embodiment, actual traffic pattern 412 can
include the set of paths that each packet in requested traffic pattern is to use in order to
achieve non-interfering operation of network 400. Logical network state entity 420 stores actual
traffic pattern 412 from packing algorithm 414 and communicates actual traffic pattern 412 to
sources at each end node 406 included in requested traffic pattern 403.

[0060] Packing algorithm 414 can include rearrangement algorithm 409. In an embodiment,
rearrangement algorithm 409 can identify how to rearrange a network so as to allow the admission
of a new admissible connection in a non-interfering fashion. In an embodiment, the input to
rearrangement algorithm can be a PAULL matrix representing a non-interfering network state and a
request to establish a new connection. The output of rearrangement algorithm can be a new PAULL
matrix representing a non-interfering network state in which the new connection is carried in
addition to the pre-existent connections. An example of an embodiment of rearrangement algorithm
409 is HUI's rearrangement algorithm. It is to be understood that HUI's rearrangement algorithm
is merely exemplary and that other rearrangement algorithms are included in the scope of the
invention.
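As a hedged illustration of what a non-interfering network state can look like, the sketch below checks the classical PAULL matrix condition for a three stage network: rows stand for source-side switches, columns for destination-side switches, each entry holds the spine nodes that carry connections between that pair, and no spine node may appear twice in any row or in any column. The data layout and names are assumptions made for this sketch; it is not the rearrangement algorithm itself.

```python
def is_non_interfering(paull_matrix, ports_per_switch):
    """Check the classical PAULL matrix conditions.

    paull_matrix maps (source_switch, destination_switch) to the set of
    spine node IDs carrying connections between that pair of switches.
    """
    row_spines = {}
    column_spines = {}
    for (source, destination), spines in paull_matrix.items():
        for spine in spines:
            used_in_row = row_spines.setdefault(source, set())
            used_in_column = column_spines.setdefault(destination, set())
            # A repeated spine means two connections share one link.
            if spine in used_in_row or spine in used_in_column:
                return False
            used_in_row.add(spine)
            used_in_column.add(spine)
    # No switch can carry more connections than it has end-node ports.
    rows_ok = all(len(s) <= ports_per_switch for s in row_spines.values())
    columns_ok = all(len(s) <= ports_per_switch for s in column_spines.values())
    return rows_ok and columns_ok


# Hypothetical states for switches L1..L3 and spine nodes S1, S2.
good_state = {("L1", "L2"): {"S1"}, ("L1", "L3"): {"S2"}}
bad_state = {("L1", "L2"): {"S1"}, ("L1", "L3"): {"S1"}}
```

In `bad_state`, both connections leaving L1 are assigned to S1, so they would compete for the same link; a rearrangement algorithm would move one of them to another spine node.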

[0061] In some networks, such as Folded Networks, rearrangement algorithm 409 can find a path 
for an admissible traffic pattern. However, the resulting path may have loops in it. After 
determining the path for all connections, but prior to having instantiated the paths, each path can 
be independently pruned to remove any loops. 
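The loop pruning step can be sketched as below. The sketch assumes a path is given as an ordered list of node names and that a loop is any segment between two visits to the same node; the function name and example path are hypothetical.

```python
def prune_loops(path):
    """Remove loops from a path given as an ordered list of node names.

    When a node is visited a second time, everything after its first
    visit is discarded, so each node appears at most once in the result.
    """
    pruned = []
    position = {}  # node -> index of that node in pruned
    for node in path:
        if node in position:
            cut = position[node]
            for removed in pruned[cut + 1:]:
                del position[removed]
            del pruned[cut + 1:]
        else:
            position[node] = len(pruned)
            pruned.append(node)
    return pruned


# Hypothetical folded-network path: both end nodes hang off switch L1, so
# the trip up to spine S1 and back is a loop.
looped = ["E1", "L1", "S1", "L1", "E2"]
loop_free = prune_loops(looped)
```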

[0062] In a CLOS network, the tuple (source, destination, spine node) uniquely identifies every
path that could potentially be selected as a consequence of rearrangement algorithm 409. The
tuple (source, destination, spine node) defines the path obtained by applying loop removal to
the path obtained by taking the shortest path from source to spine node followed by shortest
path from spine node to destination. As described above, routing tree is a shortest-path
spanning tree rooted at one of the spine nodes. The tuple (source, destination, routing tree)
identifies a loop-less shortest path from source to destination contained entirely within the
routing tree. This identification of a path is unique in network 400. The identification of a
minimally sufficient set of routing trees to support rearrangement algorithm 409 allows
programming of INFINIBAND switch forwarding tables and enables the realization of network 400 as
a SNIN 419. This is discussed further below.

[0063] Network 400 can include end node 406. End node 406 is representative of plurality of end
nodes 114 shown in FIG. 1 and referred to in FIG. 2 and FIG. 3. End node 406 can include process
426, which can be a user process that wishes to connect with network 400, in particular SNIN
419. Process 426 can be a program, job, and the like, contained in memory on end node 406 and
controlled by a processor (not shown) on end node 406. End node 406, when operating using
INFINIBAND, can include queue pair 424, which represents one half (either receive or transmit)
of an INFINIBAND communications process. Queue pair 424 is known in the art. End node 406 can
include QP mesh manager, which can be a software entity responsible for maintaining multiple
queue pairs existent on end node 406, communicating with logical network state entity 420 to
receive actual traffic pattern 412 pertaining to packet 408 created at end node 406, and
informing end node (as a source) which queue pair to use at any given instant in time.

[0064] Network 400 can include INFINIBAND switch 401, which represents any of INFINIBAND
switches referred to in FIGS. 1-3. INFINIBAND switch 401 can include forwarding table 415 to
store, in one embodiment, set of forwarding instructions 413 and plurality of DLIDs 410. As
discussed above, DLID comprises a BaseLID and reference to a routing tree (routing tree ID). In
an embodiment, a packet 408 with a DLID 421 in the packet header 411, created at end node 406
acting as a source, enters INFINIBAND switch 401. DLID 421 is looked up in forwarding table 415
to find corresponding one of plurality of DLIDs 410. Packet 408 is then forwarded toward a
destination based on the set of forwarding instructions 413 corresponding to DLID 421.
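The lookup described above can be sketched as a hop-by-hop walk. The sketch assumes each forwarding table is a mapping from DLID to the name of the next node, which stands in for an outgoing port; the table contents, DLID value and hop limit are hypothetical.

```python
def forward_packet(forwarding_tables, ingress_switch, dlid, max_hops=16):
    """Follow a packet from its ingress switch until it leaves the switches.

    forwarding_tables maps a switch name to {dlid: next_node}. The walk
    stops at the first node that has no forwarding table (an end node).
    """
    hops = [ingress_switch]
    node = ingress_switch
    while node in forwarding_tables:
        table = forwarding_tables[node]
        if dlid not in table:
            raise KeyError(f"switch {node} has no entry for DLID {dlid}")
        node = table[dlid]
        hops.append(node)
        if len(hops) > max_hops:
            raise RuntimeError("hop limit exceeded; tables may contain a loop")
    return hops


# Hypothetical tables: DLID 33 reaches end node E2 through spine node S2.
example_tables = {
    "L1": {33: "S2"},
    "S2": {33: "L2"},
    "L2": {33: "E2"},
}
example_hops = forward_packet(example_tables, "L1", 33)
```

Because every switch keys its decision on the DLID alone, choosing a different routing tree ID at the source is enough to steer the packet through a different spine node.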

[0065] In an embodiment, when network 400 is initialized, or when network 400 has a topology
change, forwarding table 415 of each INFINIBAND switch 401 can be populated with plurality of
DLIDs 410 and set of forwarding instructions 413 such that network operates as a SNIN 419 if
SNIN policy is in effect per SNIN policy entity 407. This can begin with connection controller
402 calculating a plurality of routing trees for the plurality of INFINIBAND switches in network
400. Connection controller 402 can receive the topology of network 400 (network topology data
405) from master subnet manager 404 as described above. A plurality of routing trees can be
calculated based on each spine node in a CLOS network as described with reference to FIGS. 2 and
3.

[0066] Thereafter, a plurality of DLIDs 410 and a set of forwarding instructions 413 for each
INFINIBAND switch 401 can be calculated where each of the plurality of DLIDs 410 corresponds to
one of the routing trees of which INFINIBAND switch 401 is part and one of a plurality of
destinations as referenced by a BaseLID. In an embodiment, calculating the plurality of routing
trees includes, for each spine node, calculating a shortest path from the spine node to each of
a plurality of sources and a plurality of destinations. The plurality of routing trees include
at least a portion of the plurality of INFINIBAND switches in network 400 and the corresponding
plurality of links that form a shortest path from at least one of the plurality of sources or
one of the plurality of destinations to the spine node of network 400. The addition of a routing
tree (routing tree ID) to a BaseLID produces a DLID for a given destination. Forwarding table
415 will only use the links associated with the routing tree for that particular DLID.

[0067] In an embodiment, forwarding table 415 can be populated as each DLID and set of
forwarding instructions is calculated. In another embodiment, each DLID and set of forwarding
instructions can be sent to INFINIBAND switch 401 after the plurality of DLIDs 410 and set of
forwarding instructions 413 are calculated for each of plurality of INFINIBAND switches in
network 400.

[0068] Once forwarding table 415 is populated at each of INFINIBAND switches 401 in network 400,
connection controller 402 and master subnet manager 404 can be coupled to operate network 400 as
a SNIN 419. Packet 408 can be created at one of a plurality of sources, where the one of the
plurality of sources can be located at end node 406. Packet 408 has a destination as defined by
a BaseLID of a destination in network 400. In a given time window, each source can submit to
connection controller 402 the destination where it wants to send a packet. The sum of all of
these requests by a plurality of sources can be requested traffic pattern 403. Connection
controller 402, in particular packing algorithm 414, runs rearrangement algorithm 409 for
network 400 and computes actual traffic pattern 412 using requested traffic pattern 403 and
network topology data 405, such that network 400 operates as a SNIN 419. Connection controller
402 then has logical network state entity 420 communicate actual traffic pattern 412 to the
source at end node 406 corresponding to packet 408. Actual traffic pattern 412 can comprise a
DLID 421 assigned to packet 408 such that network 400 operates as a SNIN 419. QP mesh manager
422 at end node 406 can then assign a specific queue pair corresponding to the DLID 421.

[0069] In the given time window, once connection controller 402 has assigned DLIDs to all of the
packets corresponding to the requested traffic pattern 403, packet 408 follows a path through at
least a portion of plurality of INFINIBAND switches 401 toward its destination. Time window can
be, for example and without limitation, 1/60th of a second. Each of the portion of the plurality
of INFINIBAND switches forwards the packet 408 according to the DLID 421 assigned to the packet
408. When packet 408 arrives at INFINIBAND switch 401, the DLID 421 in packet header 411 is
looked up in forwarding table 415. DLID 421 is matched with one of the plurality of DLIDs 410 in
forwarding table 415 and packet 408 is forwarded out of a port on INFINIBAND switch 401 to
another INFINIBAND switch according to set of forwarding instructions 413 corresponding to the
one of the plurality of DLIDs 410 matching the DLID 421 in packet header 411. The packet will
follow only the links designated in the routing tree corresponding to the DLID 421 assigned to
the packet. This is repeated at each portion of the plurality of INFINIBAND switches until
packet 408 reaches its destination end node. The process can be repeated for each subsequent
time window as long as network 400 is in operation. In another embodiment, each source can tell
connection controller 402 that it wants to operate during a given time frame. In this
embodiment, this data can be requested traffic pattern 403 and connection controller 402 can
compute actual traffic pattern so that network 400 operates as SNIN 419.

[0070] The above process of populating forwarding tables of INFINIBAND switches with paths
appropriate to make network 400 operate as a SNIN 419 works particularly well for a CLOS
network. However, as a CLOS network is instantiated, it is unlikely that all INFINIBAND switches
will be turned "ON" simultaneously. As such, network 400 can pass through states in which it is
not a CLOS network. Therefore the above methodology can be implemented in a non-CLOS network as
well, where the populating of forwarding tables occurs after each change in topology of network
400.

[0071] FIG. 5 illustrates a flow diagram 500 of a method of the invention according to an
embodiment of the invention. In step 502, a plurality of routing trees are calculated for a
plurality of INFINIBAND switches in a network. In an embodiment, calculating the plurality of
routing trees comprises, for each spine node in the network, calculating a shortest path from
the spine node to each of the plurality of sources and each of the plurality of destinations. In
an embodiment, the network is a CLOS network. Each of the plurality of routing trees can
comprise at least a portion of the plurality of INFINIBAND switches and corresponding plurality
of links that form a shortest path from one of the plurality of sources or one of the plurality
of destinations to a spine node of the CLOS network.

[0072] In step 504, a plurality of DLIDs and a set of forwarding instructions are calculated for
each of the plurality of INFINIBAND switches, wherein each of the plurality of DLIDs corresponds
to one of the plurality of routing trees and one of a plurality of destinations. In step 506, a
forwarding table of each of the plurality of INFINIBAND switches in the CLOS network is
populated with the plurality of DLIDs and the set of forwarding instructions.

[0073] FIG. 6 illustrates a flow diagram 600 of a method of the invention according to another
embodiment of the invention. In an embodiment, the method illustrated in FIG. 6 illustrates one
embodiment for calculating a plurality of routing trees from a plurality of spanning trees and
programming and populating forwarding tables at a plurality of INFINIBAND switches with DLIDs
and corresponding sets of forwarding instructions such that a network can operate as a SNIN. The
method is particularly suited to, but not limited to, rearrangably, non-blocking, multistage
CLOS networks.



[0074] In step 602, a plurality of end nodes, INFINIBAND switches and links define a plurality
of spanning trees. In step 604, one of the plurality of spanning trees is selected as the
current spanning tree. In step 606, one of the plurality of end nodes is selected as the current
end node. In step 608, one of the plurality of INFINIBAND (IB in FIGS. 6 and 7) switches is
selected as the current INFINIBAND switch.

[0075] In step 610, the current DLID is calculated to be the BaseLID of the current end node
plus the tree ID of the current spanning tree. In step 612, the current outgoing port is set
equal to the outgoing port from the current INFINIBAND switch which moves a packet closer to the
current end node, given that only links in the current spanning tree can be used. Step 614
represents one embodiment of the invention that includes populating the current INFINIBAND
switch's forwarding tables such that the current INFINIBAND switch forwards packets with the
DLID equaling the current DLID on an outgoing port equal to the current outgoing port. In
another embodiment of the invention, step 614 is not included and an additional step at the end
of the flow diagram in FIG. 6 is included to populate the forwarding tables with a plurality of
DLIDs and a set of forwarding instructions. In other words, in this alternate embodiment, the
forwarding tables are populated only after the plurality of routing trees, plurality of DLIDs
and set of forwarding instructions are all calculated.

[0076] In step 616, it is determined if the current INFINIBAND switch is the last of the
plurality of INFINIBAND switches. If not, the current INFINIBAND switch is set equal to the next
of the plurality of INFINIBAND switches per step 618 and the process returns to step 610. This
process repeats until, in step 616, the current INFINIBAND switch is the last of the plurality
of INFINIBAND switches, at which time the process moves to step 620. In other words, for a given
spanning tree and a given end node, each INFINIBAND switch in the network is processed per steps
610-614.

[0077] In step 620, it is determined if the current end node is the last of the plurality of end
nodes. If not, the current end node is set equal to the next of the plurality of end nodes per
step 622 and the process returns to step 608. This process repeats until, in step 620, the
current end node is the last of the plurality of end nodes, at which time the process moves to
step 624. In other words, for a given spanning tree, each end node in the network is processed
per steps 610-614.

[0078] In step 624, it is determined if the current spanning tree is the last of the plurality of 
spanning trees. If not, the current spanning tree is set equal to the next of the plurality of 
spanning trees per step 626 and the process returns to step 606. This process repeats until, in step 
624, the current spanning tree is the last of the plurality of spanning trees, at which time the 
process of FIG. 6 is completed. At the completion of the process of FIG. 6, the forwarding tables 
of each of the plurality of INFINIBAND switches are populated with a plurality of 
DLIDs and the set of forwarding instructions such that a packet arriving at an 
INFINIBAND switch can be forwarded such that the network operates as a SNIN. 
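
The nested iteration of FIG. 6 (spanning trees, then end nodes, then switches) can be sketched as follows. This is a hedged illustration under stated assumptions, not the method as claimed: the data layout, the function names, the use of a breadth-first search to find the port that moves a packet closer to the end node, and the use of a tree's list index as its tree ID are all hypothetical. The sketch also assumes BaseLIDs are spaced far enough apart that adding a tree ID never collides with another end node's DLID.

```python
from collections import deque

def tree_adjacency(tree_links):
    """Build {switch: [(local_port, neighbor_switch), ...]} from tree links.
    Each link is (switch_a, port_a, switch_b, port_b)."""
    adj = {}
    for sw_a, port_a, sw_b, port_b in tree_links:
        adj.setdefault(sw_a, []).append((port_a, sw_b))
        adj.setdefault(sw_b, []).append((port_b, sw_a))
    return adj

def port_toward(adj, switch, target_switch):
    """Port on `switch` that leads toward `target_switch` using only tree links."""
    seen = {switch}
    queue = deque()
    for port, neighbor in adj.get(switch, []):
        seen.add(neighbor)
        queue.append((neighbor, port))
    while queue:
        current, first_port = queue.popleft()
        if current == target_switch:
            return first_port
        for _, neighbor in adj.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, first_port))
    return None  # target not reachable in this tree

def build_forwarding_tables(switches, end_nodes, spanning_trees):
    """end_nodes: list of (base_lid, attached_switch, attached_port)."""
    tables = {sw: {} for sw in switches}
    for tree_id, tree_links in enumerate(spanning_trees):   # loop closed by steps 624/626
        adj = tree_adjacency(tree_links)
        for base_lid, home_switch, home_port in end_nodes:  # loop closed by steps 620/622
            for sw in switches:                             # loop closed by steps 616/618
                dlid = base_lid + tree_id                   # step 610
                if sw == home_switch:
                    port = home_port                        # end node is directly attached
                else:
                    port = port_toward(adj, sw, home_switch)  # step 612
                tables[sw][dlid] = port                     # step 614
    return tables

# Illustrative three-switch network with two spanning trees.
L12 = ("S1", 1, "S2", 1)
L23 = ("S2", 2, "S3", 1)
L13 = ("S1", 2, "S3", 2)
tables = build_forwarding_tables(
    switches=["S1", "S2", "S3"],
    end_nodes=[(4, "S1", 3), (8, "S3", 3)],
    spanning_trees=[[L12, L23], [L13, L23]],
)
```

In the example, the end node with BaseLID 8 is reachable as DLID 8 over the first tree and as DLID 9 over the second, so the choice of DLID selects which tree a packet follows.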

[0079] FIG. 7 illustrates a flow diagram of a method of the invention according to yet another 
embodiment of the invention. In step 702, a packet is created at a source in a network, wherein 
the packet is addressed to a destination. Step 704 includes executing a rearrangement algorithm 
for the network. Step 706 includes assigning one of a plurality of DLIDs to the packet. Step 708 
includes the packet following a path through at least a portion of a plurality of 
INFINIBAND switches from the one of the plurality of sources to the one of the plurality of 
destinations, wherein each of the portion of the plurality of INFINIBAND switches 
forwards the packet according to the one of the plurality of DLIDs assigned to the packet. Step 
708 includes looking up the one of the plurality of DLIDs assigned to the packet in the 
forwarding table at each of the portion of the plurality of INFINIBAND switches 
along the path from the source to the destination. In other words, each of the portion of the 
plurality of INFINIBAND switches forwards the packet in accordance with the one 
of the plurality of DLIDs assigned to the packet as found in the forwarding table at each of the 
portion of the plurality of INFINIBAND switches. 
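
The per-switch lookup of step 708 can be sketched as follows. This is an illustrative model only: the `forward_packet` function, the `wiring` map from (switch, port) to whatever is attached to that port, the hop limit, and the sample tables are assumptions introduced for the example. The rearrangement algorithm of step 704 and the DLID assignment of step 706 are not modeled; the DLID is simply passed in.

```python
def forward_packet(dlid, first_switch, tables, wiring, max_hops=16):
    """Follow a packet hop by hop, looking up its DLID at every switch.

    tables: {switch: {dlid: outgoing_port}}
    wiring: {(switch, port): name of the switch or end node on that port}
    Returns (list of (switch, port) hops, name of the end node reached).
    """
    path = []
    current = first_switch
    hops = 0
    while current in tables:  # anything without a table is treated as an end node
        if hops >= max_hops:
            raise RuntimeError("hop limit exceeded; tables may contain a loop")
        port = tables[current][dlid]       # forwarding-table lookup at this switch
        path.append((current, port))
        current = wiring[(current, port)]  # move to whatever is attached to that port
        hops += 1
    return path, current

# Illustrative three-switch network; end node "B" hangs off S3 port 3.
tables = {
    "S1": {8: 1, 9: 2},
    "S2": {8: 2, 9: 2},
    "S3": {8: 3, 9: 3},
}
wiring = {
    ("S1", 1): "S2", ("S2", 1): "S1",
    ("S2", 2): "S3", ("S3", 1): "S2",
    ("S1", 2): "S3", ("S3", 2): "S1",
    ("S1", 3): "A",  ("S3", 3): "B",
}
via_dlid_8 = forward_packet(8, "S1", tables, wiring)
via_dlid_9 = forward_packet(9, "S1", tables, wiring)
```

Both DLIDs deliver to the same end node, but over different sequences of switches, which is how assigning a DLID to a packet selects its path.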

[0080] While we have shown and described specific embodiments of the present invention, 
further modifications and improvements will occur to those skilled in the art. It is, therefore, to be 
understood that the appended claims are intended to cover all such modifications and changes as fall 
within the true spirit and scope of the invention. 



